26. Quiz - Model Tuning

Let's continue exploring the parameter space for our question classification model.

As a first step, split your data set into 90% training data and set aside the remaining 10% as unseen data. You will use that 10% to measure the accuracy of the best model you train.

On the 90% training portion, let's find the most accurate logistic regression model using 3-fold cross-validation with the following parameter grid:

  • CountVectorizer vocabulary size: [1000, 5000]
  • LogisticRegression regularization parameter: [0.0, 0.1]
  • LogisticRegression maximum number of iterations: [10]
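
To get a feel for the size of this search, the grid above can be expanded by hand. The following is a minimal plain-Python sketch, not Spark code; the dictionary keys (`vocabSize`, `regParam`, `maxIter`) are the PySpark parameter names these settings are assumed to correspond to.

```python
from itertools import product

# Parameter grid from the list above. The keys are the PySpark parameter
# names these settings are assumed to map to.
param_grid = {
    "vocabSize": [1000, 5000],  # CountVectorizer vocabulary size
    "regParam": [0.0, 0.1],     # LogisticRegression regularization parameter
    "maxIter": [10],            # LogisticRegression maximum iterations
}

# Every combination of one value per parameter is one candidate model.
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

# With k-fold cross-validation, each candidate is fitted once per fold.
num_folds = 3
fits_during_search = len(candidates) * num_folds

for candidate in candidates:
    print(candidate)
print(f"{len(candidates)} candidates x {num_folds} folds = {fits_during_search} fits")
```

The grid yields 2 x 2 x 1 = 4 candidates, so 3-fold cross-validation fits 12 models during the search, before the best candidate is refitted on the full training portion.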

Set the random seeds of all stages of the pipeline to 42.

What is the accuracy of the best model trained with the parameter grid described above (keeping all other parameters at their default values), computed on the 10% untouched data?
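
One way the whole procedure might look in PySpark is sketched below. It rests on several assumptions: the pipeline is reduced to a `CountVectorizer` followed by a `LogisticRegression`, the input DataFrame already holds tokenized text and numeric labels, and the column names `words` and `label` as well as the function name `tune_and_score` are hypothetical. Your own pipeline from earlier in the lesson may contain more stages. PySpark is imported inside the function so that the definition itself loads without a Spark installation.

```python
def tune_and_score(df, text_col="words", label_col="label", seed=42):
    """Return the held-out accuracy of the best cross-validated model.

    Assumes `df` is a Spark DataFrame in which `text_col` holds an array of
    tokens and `label_col` holds numeric class indices. Both column names
    are hypothetical placeholders.
    """
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.feature import CountVectorizer
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    # 90% for cross-validation, 10% held out for the final accuracy check.
    train, holdout = df.randomSplit([0.9, 0.1], seed=seed)

    # Reduced two-stage pipeline (an assumption, not the course's exact one).
    cv = CountVectorizer(inputCol=text_col, outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol=label_col)
    pipeline = Pipeline(stages=[cv, lr])

    # The parameter grid from the quiz statement.
    grid = (
        ParamGridBuilder()
        .addGrid(cv.vocabSize, [1000, 5000])
        .addGrid(lr.regParam, [0.0, 0.1])
        .addGrid(lr.maxIter, [10])
        .build()
    )

    # Accuracy must be requested explicitly; the evaluator defaults to F1.
    evaluator = MulticlassClassificationEvaluator(
        labelCol=label_col, metricName="accuracy"
    )

    crossval = CrossValidator(
        estimator=pipeline,
        estimatorParamMaps=grid,
        evaluator=evaluator,
        numFolds=3,
        seed=seed,
    )

    # fit() runs the search and refits the best candidate on all of `train`;
    # transform() on the fitted model then applies that best candidate.
    cv_model = crossval.fit(train)
    return evaluator.evaluate(cv_model.transform(holdout))
```

Calling `tune_and_score(df)` on a prepared DataFrame returns a single accuracy value for the 10% hold-out. The exact number depends on the data, the preprocessing stages, and the Spark version, so this sketch is not guaranteed to reproduce the quiz answer.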

SOLUTION: 0.39